Retrieving Ambiguous Sounds Using Perceptual Timbral Attributes in Audio Production Environments
For over a decade, one of the well-identified problems within audio production environments has been the effective retrieval and management of sound libraries. Most self-recorded and commercially produced sound libraries are well structured in terms of metadata and textual descriptions, allowing traditional text-based retrieval approaches to obtain satisfactory results. However, traditional information retrieval techniques are limited when retrieving ambiguous sound collections (i.e., sounds with no identifiable origin, foley sounds, synthesized sound effects, abstract sounds) due to the difficulty of describing such sounds textually and their complex psychoacoustic nature. Early psychoacoustical studies propose perceptual acoustic qualities as an effective way of describing these categories of sounds [1]. In Music Information Retrieval (MIR), this problem has mostly been studied and explored in the context of content-based audio retrieval. However, we observed that most commercially available systems integrate neither advanced content-based sound descriptions nor the visualization and interface design approaches that have evolved in recent years.
Our research aimed to investigate two things: 1. the development of an audio retrieval system incorporating high-level timbral features as search parameters, and 2. a user-centered approach to integrating these features into audio production pipelines, investigated through expert-user studies. In this project, we present a prototype similar to traditional sound browsers (list-based browsing) with the added functionality of filtering and ranking sounds by perceptual timbral features such as brightness, depth, roughness, and hardness. Our main focus was on the retrieval process driven by timbral features. Inspired by the recent focus on user-centered systems ([2], [3]) in the MIR community, we conducted in-depth interviews and a qualitative evaluation of the system with expert users in order to identify the underlying problems. Our studies observed potential applications of high-level perceptual timbral features in audio production pipelines using a probe system and expert-user studies. We also outline future guidelines and possible improvements to the system based on the outcomes of this research.
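Ranking sounds by a perceptual attribute such as brightness can be sketched with a spectral-centroid proxy, a standard acoustic correlate of perceived brightness. The toy sound library and the use of the raw centroid as the ranking score below are illustrative assumptions, not the feature implementation used in the prototype described above.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Spectral centroid in Hz: a common acoustic correlate of perceived brightness."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    if spectrum.sum() == 0:
        return 0.0
    return float((freqs * spectrum).sum() / spectrum.sum())

def rank_by_brightness(sounds, sample_rate):
    """Return sound names sorted from brightest to darkest."""
    scores = {name: spectral_centroid(sig, sample_rate) for name, sig in sounds.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical three-sound library: a low sine, a high sine, and white noise.
sr = 16000
t = np.arange(sr) / sr
library = {
    "low_tone": np.sin(2 * np.pi * 200 * t),
    "high_tone": np.sin(2 * np.pi * 6000 * t),
    "noise": np.random.default_rng(0).standard_normal(sr),
}
ranking = rank_by_brightness(library, sr)
```

A real system would precompute such scores for the whole library so that a brightness slider in the browser simply filters and reorders the cached values.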
MEL: a music entity linking system
Paper presented at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), held from October 23 to 27, 2017, in Suzhou, China. In this work, we present MEL, the first Music Entity Linking system. MEL is able to identify mentions of musical entities (e.g., albums, songs, and artists) in free text, and disambiguate them to a music knowledge base, i.e., MusicBrainz. MEL combines different state-of-the-art libraries and SimpleBrainz, an RDF knowledge base created from MusicBrainz after a simplification process. MEL is released as a REST API and as an online web demo. This work was partially funded by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
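The mention-detection and disambiguation pipeline can be illustrated with a minimal dictionary-based sketch. The toy knowledge base, the made-up IDs, and the exact-match strategy below are illustrative assumptions only; MEL's actual pipeline combines state-of-the-art NLP libraries with SimpleBrainz and real MusicBrainz identifiers.

```python
import re

# Toy knowledge base mapping surface forms to (entity type, made-up ID).
# A real system would disambiguate to actual MusicBrainz entries.
KNOWLEDGE_BASE = {
    "abbey road": ("album", "mbid-0001"),
    "the beatles": ("artist", "mbid-0002"),
    "come together": ("song", "mbid-0003"),
}

def link_entities(text):
    """Find known mentions in free text and link them to the toy knowledge base."""
    links = []
    lowered = text.lower()
    for surface, (etype, entity_id) in KNOWLEDGE_BASE.items():
        for match in re.finditer(re.escape(surface), lowered):
            links.append({
                "mention": text[match.start():match.end()],  # keep original casing
                "type": etype,
                "id": entity_id,
            })
    return sorted(links, key=lambda link: link["mention"].lower())

result = link_entities("Come Together opens Abbey Road by The Beatles.")
```

The hard part a real linker handles, and this sketch does not, is ambiguity: the same surface form can denote an artist, an album, or a song, which is why MEL backs its decisions with a knowledge base rather than a lookup table.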
Essentia API: a web API for music audio analysis
We present Essentia API, a web API providing access to a collection of state-of-the-art music audio analysis and description algorithms based on Essentia, an open-source library and machine learning (ML) models for audio and music analysis. We are developing it as part of a broader project in which we explore strategies for the commercial viability of technologies developed at the Music Technology Group (MTG) following open science and open source practices, which involves finding licensing schemes and building custom solutions. Currently, the API supports music auto-tagging and classification algorithms (for genre, instrumentation, mood/emotion, danceability, approachability, and engagement), as well as algorithms for musical key, tempo, loudness, and many more. In the future, we envision expanding it with new machine learning models developed by the MTG and our collaborators to facilitate access for a broader community of users.
Essentia.js: a JavaScript library for music and audio analysis on the web
Paper presented at the International Society for Music Information Retrieval Conference, held virtually from October 11 to 16, 2020. Open-source software libraries for audio/music analysis and feature extraction have a significant impact on the development of Audio Signal Processing and Music Information Retrieval (MIR) systems. Despite the abundance of such tools on native computing platforms, there is a lack of an extensive and easy-to-use reference library for audio feature extraction on the Web. In this paper, we present Essentia.js, an open-source JavaScript (JS) library for audio and music analysis on both web clients and JS-based servers. Along with the Web Audio API, it can be used for efficient and robust real-time audio feature extraction in web browsers. Essentia.js is modular, lightweight, and easy to use, deploy, maintain, and integrate into the existing plethora of JS libraries and Web technologies. It is powered by a WebAssembly back end of the Essentia C++ library, which provides a JS interface to a wide range of low-level and high-level audio features. It also provides a higher-level JS API and add-on MIR utility modules along with extensive documentation, usage examples, and tutorials. We benchmark the proposed library on two popular web browsers, the Node.js engine, and Android devices, comparing it to the native performance of Essentia and the Meyda JS library.
Audio and music analysis on the web using Essentia.js
Open-source software libraries have a significant impact on the development of Audio Signal Processing and Music Information Retrieval (MIR) systems. Despite the abundance of such tools, there is a lack of an extensive and easy-to-use reference library for audio feature extraction on Web clients.
In this article, we present Essentia.js, an open-source JavaScript (JS) library for audio and music analysis on both web clients and JS engines. Along with the Web Audio API, it can be used for both offline and real-time audio feature extraction in web browsers. Essentia.js is modular, lightweight, and easy to use, deploy, maintain, and integrate into the existing plethora of JS libraries and web technologies. It is powered by a WebAssembly back end cross-compiled from the Essentia C++ library, which provides a JS interface to a wide range of low-level and high-level audio features, including signal processing MIR algorithms as well as pre-trained TensorFlow.js machine learning models. It also provides a higher-level JS API and add-on MIR utility modules along with extensive documentation, usage examples, and tutorials. We benchmark the proposed library on two popular web browsers and the Node.js engine, and on four devices, including mobile Android and iOS, comparing it to the native performance of Essentia and the Meyda JS library. The work on Essentia.js has been partially funded by the Ministry of Science and Innovation of the Spanish Government under the grant agreement PID2019-111403GB-I00 (Musical AI).
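The offline versus real-time analysis modes mentioned above follow a common frame-by-frame pattern: offline analysis slides over a whole buffer at once, while real-time analysis accumulates small audio chunks and emits a feature whenever a full frame is available. The sketch below illustrates that pattern only; the RMS feature, frame sizes, and class names are illustrative choices, not Essentia.js's API.

```python
import numpy as np

FRAME_SIZE = 1024
HOP_SIZE = 512

def rms(frame):
    """Root-mean-square energy of one audio frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def extract_offline(signal):
    """Offline analysis: slide over the whole buffer at once."""
    return [
        rms(signal[start:start + FRAME_SIZE])
        for start in range(0, len(signal) - FRAME_SIZE + 1, HOP_SIZE)
    ]

class RealTimeExtractor:
    """Real-time analysis: feed small chunks (as an audio callback delivers them)
    and emit a feature value whenever a full frame has accumulated."""

    def __init__(self):
        self.buffer = np.empty(0, dtype=np.float64)
        self.features = []

    def process(self, chunk):
        self.buffer = np.concatenate([self.buffer, chunk])
        while len(self.buffer) >= FRAME_SIZE:
            self.features.append(rms(self.buffer[:FRAME_SIZE]))
            self.buffer = self.buffer[HOP_SIZE:]  # advance by one hop

# Both paths yield the same feature sequence on the same signal.
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
offline = extract_offline(sig)
rt = RealTimeExtractor()
for i in range(0, len(sig), 128):  # 128-sample chunks, as an AudioWorklet delivers
    rt.process(sig[i:i + 128])
```

The equivalence of the two paths is what lets the same analysis code serve batch processing on a server and streaming extraction in the browser.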
Da-TACOS: A dataset for cover song identification and understanding
Paper presented at the 20th annual conference of the International Society for Music Information Retrieval (ISMIR), held from November 4 to 8, 2019, in Delft, the Netherlands. This paper focuses on Cover Song Identification (CSI), an important research challenge in content-based Music Information Retrieval (MIR). Although the task itself is interesting and challenging for both academic and industry scenarios, there are a number of limitations to the advancement of current approaches. We specifically address two of them in the present study. First, the number of publicly available datasets for this task is limited, and there is no publicly available benchmark set that is widely used among researchers for comparative algorithm evaluation. Second, most of the algorithms are not publicly shared and reproducible, limiting the comparison of approaches. To overcome these limitations we propose Da-TACOS, a DaTAset for COver Song Identification and Understanding, and two frameworks for feature extraction and benchmarking to facilitate reproducibility. Da-TACOS contains 25K songs represented by unique editorial metadata plus 9 low- and mid-level features pre-computed with open-source libraries, and is divided into two subsets. The Cover Analysis subset contains audio features (e.g., key, tempo) that can serve to study how musical characteristics vary across cover songs. The Benchmark subset contains the set of features that have been frequently used in CSI research, e.g., chroma, MFCC, beat onsets, etc. Moreover, we provide initial benchmarking results for a selected number of state-of-the-art CSI algorithms using our dataset, and, for reproducibility, we share a GitHub repository containing the feature extraction and benchmarking frameworks. This work is partially supported by the MIP-Frontiers project, the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068, and by TROMPA, the Horizon 2020 project 770376-2.
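A minimal cover-detection baseline over the kind of chroma features the Benchmark subset provides can be sketched as follows. The mean-profile comparison with a maximum over circular pitch-class shifts, and the random toy features, are illustrative assumptions; they are not one of the benchmarked state-of-the-art CSI algorithms, which operate on full feature sequences.

```python
import numpy as np

def chroma_similarity(chroma_a, chroma_b):
    """Similarity between two (12 x frames) chroma sequences, maximized over
    the 12 circular pitch-class shifts so it is invariant to transposition."""
    prof_a = chroma_a.mean(axis=1)
    prof_b = chroma_b.mean(axis=1)
    prof_a = prof_a / (np.linalg.norm(prof_a) + 1e-12)
    prof_b = prof_b / (np.linalg.norm(prof_b) + 1e-12)
    # A cover in a different key appears as a rotated pitch-class profile.
    return max(float(prof_a @ np.roll(prof_b, k)) for k in range(12))

rng = np.random.default_rng(0)

def toy_track(profile):
    """200 noisy frames repeating one pitch-class profile (stand-in for real chroma)."""
    return np.tile(profile, (200, 1)).T + 0.05 * rng.random((12, 200))

triad = np.zeros(12); triad[[0, 4, 7]] = 1.0          # major-triad-like profile
cluster = np.zeros(12); cluster[[1, 2, 3, 9]] = 1.0   # unrelated profile

original = toy_track(triad)
cover = np.roll(toy_track(triad), 3, axis=0)          # same song, transposed 3 semitones
unrelated = toy_track(cluster)

sim_cover = chroma_similarity(original, cover)
sim_unrelated = chroma_similarity(original, unrelated)
```

The transposition invariance shown here is exactly why chroma-based features dominate the Benchmark subset: covers routinely change key while keeping the pitch-class content.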